[RISCV] Switch to sign-extended loads if possible in RISCVOptWInstrs #144703

Open · wants to merge 3 commits into main

Conversation

asb (Contributor) commented Jun 18, 2025

This is currently implemented as part of RISCVOptWInstrs in order to reuse hasAllNBitUsers. However, a new home or further refactoring will be needed (see the end of this note).
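
As a rough illustration (not code from the patch; the registers are placeholders), hasAllNBitUsers is the check that every user of a value only reads its low N bits:

    lwu  a0, 0(a1)          # zero-extending 32-bit load
    addw a2, a0, a2         # addw only reads bits [31:0] of its source registers
    # Every user of a0 needs only the low 32 bits, so hasAllNBitUsers holds for
    # N=32 and the lwu can be rewritten to lw without changing any user's result.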

Why prefer sign-extended loads?

  • LW is compressible while LWU is not (see the sketch after this list).
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW).
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing against GCC output, and in these cases it's a red herring.
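
A minimal sketch of the compression point, assuming the standard C extension encodings and placeholder registers:

    lwu a0, 0(a0)           # always 4 bytes: there is no compressed LWU
    lw  a0, 0(a0)           # may shrink to c.lw (2 bytes) when the registers
                            # and offset fit the compressed form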

Issues or open questions with this patch as it stands:

  • Doing something at the MI level makes sense as a last resort. I wonder if for some of the cases we could be producing a sign-extended load earlier on.
  • RISCVOptWInstrs is a slightly awkward home. It's currently not run for RV32, which means the LHU changes can actually add new diffs vs RV32. Potentially just this one transformation could be done for RV32.
  • Do we want to perform additional load narrowing? With the existing code, an LD will be narrowed to LW if only the lower bits are needed. The example that made me look at extending load normalisation in the first place was an LWU that immediately has a small mask applied, which could be narrowed to LB (sketched after this list). It's not clear this has any real benefit.
  • As I've put the specific home of this change as an open question, I haven't added MIR tests for RISCVOptWInstrs. I'll add that if we decide this is the right home.
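
A sketch of the narrowing case mentioned above (illustrative operands only, assuming the usual little-endian layout; this is not code from the patch):

    lwu  a0, 0(a1)          # 32-bit zero-extending load
    andi a0, a0, 15         # only bits [3:0] of the loaded value are used
    # could in principle become
    lb   a0, 0(a1)          # the low byte is identical, and the mask discards
    andi a0, a0, 15         # every bit the narrower load could change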

The first version of the patch that included LBU->LB conversion changed 95k instructions across a compile of llvm-test-suite (including SPEC 2017). As was pointed out, Zcb has C_LH and C_LHU but for bytes only C_LBU (and there's also no C_LB in the baseline C extension). Therefore, we avoid converting LBU to LB as it is less compressible. This leaves us with ~36k instructions changed across a compile of llvm-test-suite.
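
For concreteness, the byte-load trade-off looks like this (illustrative only):

    lbu a0, 0(a0)           # can compress to c.lbu when Zcb is available
    lb  a0, 0(a0)           # has no compressed form in Zcb or the base C extension
    # so an LBU->LB rewrite can only lose compression, which is why the updated
    # patch avoids it.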

llvmbot (Member) commented Jun 18, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes

This is currently implemented as part of RISCVOptWInstrs in order to reuse hasAllNBitUsers. However, a new home or further refactoring will be needed (see the end of this note).

Why prefer sign-extended loads?

  • Sign-extending loads are more compressible. There is no compressed LWU, and LBU and LHU are only available in Zcb.
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing GCC code and in these cases it's a red herring.

Issues or open questions with this patch as it stands:

  • Doing something at the MI level makes sense as a last resort. I wonder if for some of the cases we could be producing a sign-extended load earlier on.
  • RISCVOptWInstrs is a slightly awkward home. It's currently not run for RV32, which means the LBU/LHU changes can actually add new diffs vs RV32. Potentially just this one transformation could be done for RV32.
  • Do we want to perform additional load narrowing? With the existing code, an LD will be narrowed to LW if only the lower bits are needed. The example that made me look at extending load normalisation in the first place was an LWU that immediately has a small mask applied, which could be narrowed to LB. It's not clear this has any real benefit.

This patch changes 95k instructions across a compile of llvm-test-suite (including SPEC 2017), and all tests complete successfully afterwards.


Patch is 468.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144703.diff

58 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp (+45)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll (+96-96)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-cond-sub-clamp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/bf16-promote.ll (+30-15)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-convert.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/bfloat.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/double-convert-strict.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/float-convert-strict.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/fold-mem-offset.ll (+132-65)
  • (modified) llvm/test/CodeGen/RISCV/half-arith.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/half-convert-strict.ll (+15-27)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+21-39)
  • (modified) llvm/test/CodeGen/RISCV/hoist-global-addr-base.ll (+15-7)
  • (modified) llvm/test/CodeGen/RISCV/local-stack-slot-allocation.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/mem64.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/memcpy-inline.ll (+94-94)
  • (modified) llvm/test/CodeGen/RISCV/memcpy.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/memmove.ll (+35-35)
  • (modified) llvm/test/CodeGen/RISCV/nontemporal.ll (+160-160)
  • (modified) llvm/test/CodeGen/RISCV/prefer-w-inst.mir (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbkb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+512-512)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+33-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+97-97)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-vrgather.ll (+32-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-load-store.ll (-23)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+87-87)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+17-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-vpload.ll (+2-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulu.ll (-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsubu.ll (+35-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/memcpy-inline.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/strided-vpload.ll (+15-6)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/stack-clash-prologue.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/unaligned-load-store.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/urem-seteq-illegal-types.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-by-byte-multiple-legalization.ll (+132-132)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-legalization.ll (+72-72)
  • (modified) llvm/test/CodeGen/RISCV/zdinx-boundary-check.ll (+3-3)
diff --git a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
index ed61236415ccf..a141bae55b70f 100644
--- a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
+++ b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
@@ -71,6 +71,8 @@ class RISCVOptWInstrs : public MachineFunctionPass {
                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
   bool appendWSuffixes(MachineFunction &MF, const RISCVInstrInfo &TII,
                        const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
+  bool convertZExtLoads(MachineFunction &MF, const RISCVInstrInfo &TII,
+                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
@@ -788,6 +790,47 @@ bool RISCVOptWInstrs::appendWSuffixes(MachineFunction &MF,
   return MadeChange;
 }
 
+bool RISCVOptWInstrs::convertZExtLoads(MachineFunction &MF,
+                                      const RISCVInstrInfo &TII,
+                                      const RISCVSubtarget &ST,
+                                      MachineRegisterInfo &MRI) {
+  bool MadeChange = false;
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : MBB) {
+      unsigned WOpc;
+      int UsersWidth;
+      switch (MI.getOpcode()) {
+      default:
+        continue;
+      case RISCV::LBU:
+        WOpc = RISCV::LB;
+        UsersWidth = 8;
+        break;
+      case RISCV::LHU:
+        WOpc = RISCV::LH;
+        UsersWidth = 16;
+        break;
+      case RISCV::LWU:
+        WOpc = RISCV::LW;
+        UsersWidth = 32;
+        break;
+      }
+
+      if (hasAllNBitUsers(MI, ST, MRI, UsersWidth)) {
+        LLVM_DEBUG(dbgs() << "Replacing " << MI);
+        MI.setDesc(TII.get(WOpc));
+        MI.clearFlag(MachineInstr::MIFlag::NoSWrap);
+        MI.clearFlag(MachineInstr::MIFlag::NoUWrap);
+        MI.clearFlag(MachineInstr::MIFlag::IsExact);
+        LLVM_DEBUG(dbgs() << "     with " << MI);
+        MadeChange = true;
+      }
+    }
+  }
+
+  return MadeChange;
+}
+
 bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -808,5 +851,7 @@ bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (ST.preferWInst())
     MadeChange |= appendWSuffixes(MF, TII, ST, MRI);
 
+  MadeChange |= convertZExtLoads(MF, TII, ST, MRI);
+
   return MadeChange;
 }
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
index a49e94f4bc910..620c5ecc6c1e7 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
@@ -246,17 +246,11 @@ define double @fcvt_d_wu(i32 %a) nounwind {
 }
 
 define double @fcvt_d_wu_load(ptr %p) nounwind {
-; RV32IFD-LABEL: fcvt_d_wu_load:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    lw a0, 0(a0)
-; RV32IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: fcvt_d_wu_load:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    lwu a0, 0(a0)
-; RV64IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: fcvt_d_wu_load:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    lw a0, 0(a0)
+; CHECKIFD-NEXT:    fcvt.d.wu fa0, a0
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_d_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
index fa093623dd6f8..bbea7929a304e 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
@@ -232,17 +232,11 @@ define float @fcvt_s_wu(i32 %a) nounwind {
 }
 
 define float @fcvt_s_wu_load(ptr %p) nounwind {
-; RV32IF-LABEL: fcvt_s_wu_load:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    lw a0, 0(a0)
-; RV32IF-NEXT:    fcvt.s.wu fa0, a0
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: fcvt_s_wu_load:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    lwu a0, 0(a0)
-; RV64IF-NEXT:    fcvt.s.wu fa0, a0
-; RV64IF-NEXT:    ret
+; CHECKIF-LABEL: fcvt_s_wu_load:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lw a0, 0(a0)
+; CHECKIF-NEXT:    fcvt.s.wu fa0, a0
+; CHECKIF-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_s_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
index 9690302552090..65838f51fc920 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
@@ -748,7 +748,7 @@ define signext i32 @ctpop_i32_load(ptr %p) nounwind {
 ;
 ; RV64ZBB-LABEL: ctpop_i32_load:
 ; RV64ZBB:       # %bb.0:
-; RV64ZBB-NEXT:    lwu a0, 0(a0)
+; RV64ZBB-NEXT:    lw a0, 0(a0)
 ; RV64ZBB-NEXT:    cpopw a0, a0
 ; RV64ZBB-NEXT:    ret
   %a = load i32, ptr %p
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
index cd59c9e01806d..ba058ca0b500a 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
@@ -114,7 +114,7 @@ define i64 @pack_i64_2(i32 signext %a, i32 signext %b) nounwind {
 define i64 @pack_i64_3(ptr %0, ptr %1) {
 ; RV64I-LABEL: pack_i64_3:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    lwu a0, 0(a0)
+; RV64I-NEXT:    lw a0, 0(a0)
 ; RV64I-NEXT:    lwu a1, 0(a1)
 ; RV64I-NEXT:    slli a0, a0, 32
 ; RV64I-NEXT:    or a0, a0, a1
@@ -122,8 +122,8 @@ define i64 @pack_i64_3(ptr %0, ptr %1) {
 ;
 ; RV64ZBKB-LABEL: pack_i64_3:
 ; RV64ZBKB:       # %bb.0:
-; RV64ZBKB-NEXT:    lwu a0, 0(a0)
-; RV64ZBKB-NEXT:    lwu a1, 0(a1)
+; RV64ZBKB-NEXT:    lw a0, 0(a0)
+; RV64ZBKB-NEXT:    lw a1, 0(a1)
 ; RV64ZBKB-NEXT:    pack a0, a1, a0
 ; RV64ZBKB-NEXT:    ret
   %3 = load i32, ptr %0, align 4
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
index 69519c00f88ea..27c6d0240f987 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
@@ -8,13 +8,13 @@ define void @lshr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -85,13 +85,13 @@ define void @shl_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -162,13 +162,13 @@ define void @ashr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -244,25 +244,25 @@ define void @lshr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -395,25 +395,25 @@ define void @shl_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -541,25 +541,25 @@ define void @ashr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -695,7 +695,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -707,7 +707,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -729,7 +729,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1028,7 +1028,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1040,7 +1040,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1062,7 +1062,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1361,7 +1361,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1373,7 +1373,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1395,7 +1395,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1690,7 +1690,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1702,7 +1702,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1724,7 +1724,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2020,7 +2020,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2032,7 +2032,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2054,7 +2054,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2353,7 +2353,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2365,7 +2365,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2387,7 +2387,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2697,7 +2697,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2705,7 +2705,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    lbu s5, 17(a0)
 ; RV64I-NEXT:    lbu s6, 18(a0)
@@ -2719,7 +2719,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s8, 20(a0)
 ; RV64I-NEXT:    lbu s9, 21(a0)
 ; RV64I-NEXT:    lbu s10, 22(a0)
-; RV64I-NEXT:    lbu s11, 23(a0)
+; RV64I-NEXT:    lb s11, 23(a0)
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2741,7 +2741,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s2, 28(a0)
 ; RV64I-NEXT:    lbu s3, 29(a0)
 ; RV64I-NEXT:    lbu s4, 30(a0)
-; RV64I-NEXT:    lbu a0, 31(a0)
+; RV64I-NEXT:    lb a0, 31(a0)
 ; RV64I-NEXT:    slli s9, s9, 8
 ; RV64I-NEXT:    slli s11, s11, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2763,7 +2763,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a0, 4(a1)
 ; RV64I-NEXT:    lbu s1, 5(a1)
 ; RV64I-NEXT:    lbu s4, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli s8, s8, 8
 ; RV64I-NEXT:    or s7, s8, s7
 ; RV64I-NEXT:    slli s1, s1, 8
@@ -3621,7 +3621,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -3629,7 +3629,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    ...
[truncated]

llvmbot (Member) commented Jun 18, 2025

@llvm/pr-subscribers-llvm-globalisel

github-actions bot commented Jun 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

topperc (Collaborator) commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

asb (Contributor, Author) commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

You're right, I'll update to exclude LB.

EDIT: Now pushed, and patch description updated.

topperc (Collaborator) commented Jun 18, 2025

Have you looked at doing this during instruction selection using the hasAllNBitUsers in RISCVISelDAGToDAG.cpp? That isn't restricted to RV64, though it won't work across basic blocks.

topperc (Collaborator) commented Jun 18, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU: https://godbolt.org/z/cbje4aT7P. I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.
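
As an illustration of that ambiguity (a made-up case, not reproduced from the Godbolt link): when an unsigned 16-bit value is loaded and then only compared against a small constant, either extension is equally correct, so the two compilers can legitimately pick different forms:

    # e.g. for `return g == 7;` with g an unsigned 16-bit global, either load works:
    lhu a0, 0(a0)           # zero-extend: equals 7 exactly when the halfword is 7
    lh  a0, 0(a0)           # sign-extend: 7 sign-extends to 7, and any other
                            # halfword sign-extends to a value != 7
    # The comparison sequence that follows is the same either way.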

asb (Contributor, Author) commented Jun 19, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

Agreed.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU: https://godbolt.org/z/cbje4aT7P. I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.

That may be true. I spotted this while looking at a workload where we emitted some LWUs that GCC didn't, and wondered whether it was an indicator of us overall making different/worse choices around sign/zero extension (of course it turned out to be a case where either was equivalent), and I'm sure I've seen the same before. But I totally believe there are other cases where we make different choices. I'll quantify how often it kicks in, but I probably wouldn't mind dropping LHU->LH. Perhaps the argument for this kind of change is more just "canonicalisation".

Have you looked at doing this during instruction selection using the hasAllNBitUsers in RISCVISelDAGToDAG.cpp?

I'll try that and report back.
